# 04. Fixed Q-Targets
## Summary
In Q-Learning, we update a guess with a guess: the TD target itself depends on the very weights we are changing, and this can potentially lead to harmful correlations. To avoid this, we can update the parameters w in the network \hat{q} to better approximate the action value corresponding to state S and action A with the following update rule:
\Delta w = \alpha \cdot \overbrace{( \underbrace{R + \gamma \max_a\hat{q}(S', a, w^-)}_{\rm {TD~target}} - \underbrace{\hat{q}(S, A, w)}_{\rm {old~value}})}^{\rm {TD~error}} \nabla_w\hat{q}(S, A, w)
where w^- are the weights of a separate target network that are not changed during the learning step, and (S, A, R, S') is an experience tuple.
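To make this concrete, here is a minimal sketch of one learning step in PyTorch. It assumes `q_network` and `target_network` are two instances of the same architecture, and `experiences` is a batch of (S, A, R, S', done) tensors; the function and variable names here are illustrative, not from a specific implementation.

```python
import torch
import torch.nn.functional as F

def learn(q_network, target_network, optimizer, experiences, gamma=0.99):
    """One gradient step on the primary network, with the TD target computed
    from the fixed target-network weights w^- (illustrative sketch)."""
    states, actions, rewards, next_states, dones = experiences

    # TD target: R + gamma * max_a q_hat(S', a, w^-); no gradient flows through w^-.
    with torch.no_grad():
        q_targets_next = target_network(next_states).max(dim=1, keepdim=True)[0]
        q_targets = rewards + gamma * q_targets_next * (1 - dones)

    # Old value: q_hat(S, A, w) from the primary network
    # (actions is a LongTensor of shape [batch, 1]).
    q_expected = q_network(states).gather(1, actions)

    # Minimising the squared TD error implements the weight update for w shown above.
    loss = F.mse_loss(q_expected, q_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```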
Note: Ever wondered how the example in the video would look in real life? See: Carrot Stick Riding.
## Quiz
SOLUTION:
- The Deep Q-Learning algorithm uses two separate networks with identical architectures.
- The target Q-Network's weights are updated less often (or more slowly) than those of the primary Q-Network (see the sketch after this list).
- Without fixed Q-targets, we would encounter a harmful form of correlation, whereby we shift the parameters of the network based on a constantly moving target.
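The second point above is commonly implemented either as a hard copy of w into w^- every C learning steps, or as a slow "soft" update after every step. Below is a minimal sketch of both, reusing the hypothetical network names from the earlier example; the interpolation parameter `tau` is an assumed hyperparameter, not one given in this lesson.

```python
def hard_update(q_network, target_network):
    # Copy the primary weights w into the target weights w^- (done every C learning steps).
    target_network.load_state_dict(q_network.state_dict())

def soft_update(q_network, target_network, tau=1e-3):
    # Blend the primary weights into the target weights a little at a time:
    # w^- <- tau * w + (1 - tau) * w^-
    for target_param, param in zip(target_network.parameters(), q_network.parameters()):
        target_param.data.copy_(tau * param.data + (1.0 - tau) * target_param.data)
```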